(54) Title: GENE EXPRESSION AND EVALUATION SYSTEM 
(57) Abstract 

An efficient and easy to use query system for a gene expression database is provided. Using such a system, one can easily identify genes or 
expressed sequence tags whose expression correlates to particular tissue types. Various tissue types may correspond to different diseases, 
states of disease progression, different organs, different species, etc. Researchers may now use large scale gene expression databases to full 
advantage. 
GENE EXPRESSION AND EVALUATION SYSTEM 



CROSS-REFERENCE TO RELATED APPLICATIONS 
The present application claims priority from U.S. Prov. App. No. 60/053,842 
filed July 25, 1997, entitled COMPREHENSIVE BIO-INFORMATICS DATABASE, from 
U.S. Prov. App. No. 60/069,198 filed on December 11, 1997, entitled COMPREHENSIVE 
DATABASE FOR BIOINFORMATICS, and from U.S. Prov. App. No. 60/069,436, entitled 
GENE EXPRESSION AND EVALUATION SYSTEM, filed on December 11, 1997. The 
contents of all three provisional applications are herein incorporated by reference. 

The subject matter of the present application is related to the subject matter of 
the following three co-assigned applications filed on the same day as the present application: 
METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE 
(Attorney Docket No. 018547-033810), METHOD AND SYSTEM FOR PROVIDING A 
POLYMORPHISM DATABASE (Attorney Docket No. 018547-033820), and METHOD AND 
SYSTEM FOR PROVIDING A PROBE ARRAY CHIP DESIGN DATABASE (Attorney 
Docket No. 018547-033830). The contents of these three applications are herein incorporated 
by reference. 

BACKGROUND OF THE INVENTION 
The present invention relates to computer systems and more particularly to 
computer systems for analyzing expression levels or concentrations. 

Devices and computer systems have been developed for collecting information 
about gene expression or expressed sequence tag (EST) expression in large numbers of tissue 
samples. For example, PCT application WO92/10588, incorporated herein by reference for 
all purposes, describes techniques for sequencing or sequence checking nucleic acids and 
other materials. Probes for performing these operations may be formed in arrays according to 
the methods of, for example, the pioneering techniques disclosed in U.S. Patent 
No. 5,143,854 and U.S. Patent No. 5,571,639, both incorporated herein by reference for all 
purposes. 

According to one aspect of the techniques described therein, an array of 
nucleic acid probes is fabricated at known locations on a chip or substrate. A fluorescently 
labeled nucleic acid is then brought into contact with the chip and a scanner generates an 
image file indicating the locations where the labeled nucleic acids bound to the chip. Based 
upon the identities of the probes at these locations, it becomes possible to extract information 
such as the monomer sequence of DNA or RNA. 

Computer-aided techniques for monitoring gene expression using such arrays 
of probes have been developed as disclosed in EP Pub. No. 0848067 and PCT publication 
No. WO 97/10365, the contents of which are herein incorporated by reference. Many disease 
states are characterized by differences in the expression levels of various genes either through 
changes in the copy number of the genetic DNA or through changes in levels of transcription 
(e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) of 
particular genes. For example, losses and gains of genetic material play an important role in 
malignant transformation and progression. Furthermore, changes in the expression 
(transcription) levels of particular genes (e.g., oncogenes or tumor suppressors) serve as 
signposts for the presence and progression of various cancers. 

Information on expression of genes or expressed sequence tags may be 
collected on a large scale in many ways, including the probe array techniques described 
above. One of the objectives in collecting this information is the identification of genes or 
ESTs whose expression is of particular importance. Researchers wish to answer questions 
such as: 1) Which genes are expressed in cells of a malignant tumor but not expressed in 
either healthy tissue or tissue treated according to a particular regime? 2) Which genes or 
ESTs are expressed in particular organs but not in others? 3) Which genes or ESTs are 
expressed in particular species but not in others? 

Collecting vast amounts of expression data from large numbers of samples 
including all the tissue types mentioned above is but the first step in answering these 
questions. To derive full value from the investment made in collecting and storing expression 
data, one must be able to efficiently mine the data to find items of particular relevance. What 
is needed is an efficient and easy to use query system for a gene expression database. 

SUMMARY OF THE INVENTION 
An efficient and easy to use query system for a gene expression database is 
provided by virtue of the present invention. Using such a system, one can easily identify 
genes or expressed sequence tags whose expression correlates to particular tissue types. 
Various tissue types may correspond to different diseases, states of disease progression, 
different organs, different species, etc. Researchers may now use large scale gene expression 
databases to full advantage. 

According to a first aspect of the present invention, a method is provided in a 
computer system for operating a database storing information about compound concentration. 
The method includes: providing a database including concentrations of a plurality of 
compounds as measured in a plurality of samples, accepting a user query to the database to 
identify desired ones of the plurality of compounds, the user query specifying concentration 
characteristics of the desired compounds in selected ones of the plurality of samples, and 
comparing the concentration characteristics to the concentrations stored in the database to 
identify the desired compounds. 

A further understanding of the nature and advantages of the inventions herein 
may be realized by reference to the remaining portions of the specification and the attached 
drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 illustrates an example of a computer system that may be used to execute 
software embodiments of the present invention. 

Fig. 2 shows a system block diagram of a typical computer system. 

Fig. 3 is a flowchart describing steps of developing expression data according 
to one embodiment of the present invention. 

Fig. 4 is a flowchart describing steps of querying an expression database 
according to one embodiment of the present invention. 

Figs. 5A-5L depict a user interface for querying an expression database 
according to one embodiment of the present invention. 

DESCRIPTION OF SPECIFIC EMBODIMENTS 
Fig. 1 illustrates an example of a computer system that may be used to execute 
software embodiments of the present invention. Fig. 1 shows a computer system 1 which 
includes a monitor 3, screen 5, cabinet 7, keyboard 9, and mouse 11. Mouse 11 may have 
one or more buttons such as mouse buttons 13. Cabinet 7 houses a CD-ROM drive 15 and a 
hard drive (not shown) that may be utilized to store and retrieve software programs including 
computer code incorporating the present invention. Although a CD-ROM 17 is shown as the 
computer readable medium, other computer readable media including floppy disks, DRAM, 
hard drives, flash memory, tape, and the like may be utilized. Cabinet 7 also houses familiar 
computer components (not shown) such as a processor, memory, and the like. 

Fig. 2 shows a system block diagram of computer system 1 used to execute 
software embodiments of the present invention. As in Fig. 1, computer system 1 includes 
monitor 3 and keyboard 9. Computer system 1 further includes subsystems such as a central 
processor 50, system memory 52, I/O controller 54, display adapter 56, removable disk 58, 
fixed disk 60, network interface 62, and speaker 64. Removable disk 58 is representative of 
removable computer readable media like floppies, tape, CD-ROM, removable hard drive, 
flash memory, and the like. Fixed disk 60 is representative of an internal hard drive or the 
like. Other computer systems suitable for use with the present invention may include 
additional or fewer subsystems. For example, another computer system could 
include more than one processor 50 (i.e., a multi-processor system) or memory cache. 

Arrows such as 66 represent the system bus architecture of computer system 1. 
However, these arrows are illustrative of any interconnection scheme serving to link the 
subsystems. For example, display adapter 56 may be connected to central processor 50 
through a local bus or the system may include a memory cache. Computer system 1 shown in 
Fig. 2 is but an example of a computer system suitable for use with the present invention. 
Other configurations of subsystems suitable for use with the present invention will be readily 
apparent to one of ordinary skill in the art. In one embodiment, the computer system is an 
IBM compatible personal computer. 

The VLSIPS™ and GeneChip™ technologies provide methods of making and 
using very large arrays of polymers, such as nucleic acids, on very small chips. See U.S. 
Patent No. 5,143,854 and PCT Patent Publication Nos. WO 90/15070 and 92/10092, each of 
which is hereby incorporated by reference for all purposes. Nucleic acid probes on the chip 
are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest 
(the "target" nucleic acid). 

It should be understood that the probes need not be nucleic acid probes but 
may also be other polymers such as peptides. Peptide probes may be used to detect the 
concentration of peptides, polypeptides, or polymers in a sample. The probes should be 
carefully selected to have bonding affinity to the compound whose concentration they are to 
be used to measure. 

In one embodiment, the present invention provides methods of reviewing and 
analyzing information relating to the concentration of compounds in a sample as measured by 
monitoring affinity of the compounds to polymers such as polymer probes. In a particular 
application, the concentration information is generated by analysis of hybridization intensity 
files for a chip containing hybridized nucleic acid probes. The hybridization of a nucleic acid 
sample to certain probes may represent the expression level of one or more genes or expressed 
sequence tags (EST). The expression level of a gene or EST is herein understood to be the 
concentration within a sample of mRNA or protein that would result from the transcription of 
the gene or EST. 

Expression level information that is reviewed and/or analyzed by virtue of the 
present invention need not be obtained from probes but may originate from any source. If the 
expression information is collected from a probe array, the probe array need not meet any 
particular criteria for size and density. Furthermore, the present invention is not limited to 
reviewing and/or analyzing fluorescent measurements of bondings such as hybridizations but 
may be readily utilized for reviewing and/or analyzing other measurements. 

Concentration of compounds other than nucleic acids may be reviewed and/or 
analyzed according to one embodiment of the present invention. For example, a probe array 
may include peptide probes which may be exposed to protein samples, polypeptide samples, 
or peptide samples which may or may not bond to the peptide probes. By appropriate 
selection of the peptide probes, one may detect the presence or absence of particular proteins, 
polypeptides, or peptides which would bond to the peptide probes. 

A system that designs a chip mask, synthesizes the probes on the chip, labels 
nucleic acids from a target sample, and scans the hybridized probes is set forth in U.S. Patent 
No. 5,571,639 which is hereby incorporated by reference for all purposes. However, the 
present invention may be used separately for reviewing and/or analyzing the results of other 
systems for generating expression information, or for reviewing and/or analyzing 
concentrations of polymers other than nucleic acids. 

The term "perfect match probe" refers to a probe that has a sequence that is 
perfectly complementary to a particular target sequence. The test probe is typically perfectly 
complementary to a portion (subsequence) of the target sequence. The term "mismatch 
control" or "mismatch probe" refers to a probe whose sequence is deliberately selected not to 
be perfectly complementary to a particular target sequence. For each mismatch (MM) control 
in an array there typically exists a corresponding perfect match (PM) probe that is perfectly 
complementary to the same particular target sequence. 

Among the important pieces of information obtained from the chips are the 
relative fluorescent intensities obtained from the perfect match probes and mismatch probes. 
These intensity levels are used to estimate an expression level for a gene or EST. The 
computer system used for analysis will preferably have available other details of the 
experiment including possibly the gene name, gene sequence, probe sequences, probe 
locations on the substrate, and the like. 

An expression analysis is performed for each gene for each experiment. Fig. 3 
is a flowchart describing steps of estimating an expression level for a particular gene as 
measured in a particular experiment on a chip. At step 302, the computer system receives 
raw scan data of N pairs of perfect match and mismatch probes. In a preferred embodiment, 
the hybridization intensities are photon counts from a fluorescein labeled target that has 
hybridized to the probes on the substrate. For simplicity, the hybridization intensity of a 
perfect match probe will be designated "Ipm" and the hybridization intensity of a mismatch 
probe will be designated "Imm". 

Hybridization intensities for a pair of probes are retrieved at step 304. The 
background signal intensity is subtracted from each of the hybridization intensities of the pair 
at step 306. Background subtraction can also be performed on all the raw scan data at the 
same time. 

At step 308, the hybridization intensities of the pair of probes are compared to 
a difference threshold (D) and a ratio threshold (R). It is determined if the difference between 
the hybridization intensities of the pair (Ipm - Imm) is greater than or equal to the difference 
threshold AND the quotient of the hybridization intensities of the pair (Ipm / Imm) is greater 
than or equal to the ratio threshold. These thresholds are typically user defined 
values that have been determined to produce accurate expression monitoring of a gene or 
genes. In one embodiment, the difference threshold is 20 and the ratio threshold is 12. 

If Ipm - Imm >= D and Ipm / Imm >= R, the value NPOS is incremented at step 
310. In general, NPOS is a value that indicates the number of pairs of probes which have 
hybridization intensities indicating that the gene is likely expressed. NPOS is utilized in a 
determination of the expression of the gene. 

At step 312, it is determined if Imm - Ipm >= D and Imm / Ipm >= R. If these 
expressions are true, the value NNEG is incremented at step 314. In general, NNEG is a 
value that indicates the number of pairs of probes which have hybridization intensities 
indicating that the gene is likely not expressed. NNEG, like NPOS, is utilized in a 
determination of the expression of the gene. 

For each pair that exhibits hybridization intensities either indicating the gene is 
expressed or not expressed, a log ratio value (LR) and intensity difference value (IDIF) are 
calculated at step 316. LR is calculated by the log of the quotient of the hybridization 
intensities of the pair (Ipm / Imm). The IDIF is calculated by the difference between the 
hybridization intensities of the pair (Ipm - Imm). If there is a next pair of hybridization 
intensities at step 318, they are retrieved at step 304. 
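
By way of illustration only, the loop of Fig. 3 might be sketched in Python as follows. The function name, the parameter names, and the returned dictionary are hypothetical and do not appear in this specification. The sketch further assumes that the test at step 312 mirrors the test at step 308 with the roles of the two intensities exchanged, that the logarithm is taken to base 10, and that pairs with a non-positive background-subtracted intensity are skipped.

```python
import math


def analyze_probe_pairs(pairs, background, diff_threshold, ratio_threshold):
    """Tally NPOS and NNEG and collect LR and IDIF for the pairs that count.

    pairs: iterable of (raw_pm, raw_mm) intensities for one gene on one chip.
    """
    npos = nneg = 0
    log_ratios, intensity_diffs = [], []
    for raw_pm, raw_mm in pairs:                       # steps 304 and 318
        i_pm = raw_pm - background                     # step 306
        i_mm = raw_mm - background
        if i_pm <= 0 or i_mm <= 0:
            continue  # assumption: skip pairs whose ratio or log is undefined
        positive = (i_pm - i_mm >= diff_threshold
                    and i_pm / i_mm >= ratio_threshold)  # step 308
        negative = (i_mm - i_pm >= diff_threshold
                    and i_mm / i_pm >= ratio_threshold)  # step 312 (assumed mirror test)
        if positive:
            npos += 1                                  # step 310
        elif negative:
            nneg += 1                                  # step 314
        if positive or negative:                       # step 316
            log_ratios.append(math.log10(i_pm / i_mm))  # assumption: base 10
            intensity_diffs.append(i_pm - i_mm)
    return {"npos": npos, "nneg": nneg, "lr": log_ratios, "idif": intensity_diffs}
```

The thresholds are passed in by the caller because they are described above as user defined values.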

For each analysis performed, certain data is stored in an expression analysis 
database. There is preferably a record for each gene or EST for which the chip measures 
expression. This record includes fields to hold various pieces of information. One field 
stores an analysis ID to identify the analysis. A result type ID field indicates whether the 
listed expression results indicate that the gene is present, marginal, absent, or unknown based 
on application of a decision matrix to the values P1, P2, P3, and P4. A number_positive field 
shows NPOS. A number_negative field shows NNEG. A number_used field shows the 
number of probes belonging to pairs that incremented NNEG or NPOS. A number_all field 
indicates N. An average log ratio field indicates the average LR for all probe pairs. A 
number_positive_exceeds field indicates the value of NPOS - NNEG. A 
number_negative_exceeds field indicates the value of NNEG - NPOS. An average 
differential intensity field indicates the average IDIF for the probe pairs. A 
number_in_average field indicates the number of probe pairs used in computing the average. 
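
One possible in-memory layout for such a record is sketched below in Python. The class name, the helper function, and the attribute names for the two averages are hypothetical. The sketch assumes that number_used counts probe pairs rather than individual probes, and that the result type is supplied by the caller, since the decision matrix over P1 through P4 is not detailed here.

```python
from dataclasses import dataclass


@dataclass
class ExpressionRecord:
    analysis_id: str
    result_type: str  # "present", "marginal", "absent", or "unknown"
    number_positive: int
    number_negative: int
    number_used: int
    number_all: int
    average_log_ratio: float
    number_positive_exceeds: int
    number_negative_exceeds: int
    average_differential_intensity: float
    number_in_average: int


def build_record(analysis_id, result_type, n_pairs, npos, nneg, log_ratios, int_diffs):
    """Fill a record from the per-gene tallies (illustrative helper)."""
    count = len(int_diffs)
    return ExpressionRecord(
        analysis_id=analysis_id,
        result_type=result_type,
        number_positive=npos,
        number_negative=nneg,
        number_used=npos + nneg,  # assumption: counted in pairs
        number_all=n_pairs,
        average_log_ratio=sum(log_ratios) / len(log_ratios) if log_ratios else 0.0,
        number_positive_exceeds=npos - nneg,
        number_negative_exceeds=nneg - npos,
        average_differential_intensity=sum(int_diffs) / count if count else 0.0,
        number_in_average=count,
    )
```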

Steps of operating a user interface to the expression database will now be 
illustrated with reference to Fig. 4. The steps of Fig. 4 may be repeated or may occur in a 
different order, or one or more steps may be omitted. The discussion of the user interface 
will also refer to Figs. 5A-5L which depict representative screen displays of the user 
interface. 

At step 402, the user selects files of expression analysis results for querying. 
Fig. 5A illustrates an interface screen where the user may specify expression results files. 
Each file represents one experiment. A table 502 lists the files that have already been 
selected. A given list may be saved for later use by selecting a button 504. A previously 
saved list may be deleted by selecting a button 506. A button 508 resets the list depicted in 
table 502 to a previously saved version. An import button 512 imports the contents of the 
files depicted in table 502 for querying. Within table 502, a file name column lists the file 
names that would be imported by application of import button 512. A code column indicates 
the tissue type for the expression data in each file. A replicate column indicates whether the file 
is a duplicate. A chip design code column indicates the chip design used to generate the data 
for the file. Various other columns (not shown) give further information about the analysis 
result data. 

By selecting a select files button 514, the user calls up a select files screen 516 
as shown in Fig. 5B. This provides an interactive file search and selection process that does 
not require typing in the file name. Before importing the file list, the user should select a 
species by using a species drop-down list 518 as shown in Fig. 5C. An analysis-type 
drop-down list 519 allows the user to select between a relative expression analysis and an absolute 
expression analysis. 

Fig. 5D shows a normalization form 520 for normalizing imported expression 
results at step 404. The software scales the average difference data generated by the analysis 
routine based on the user's selections on normalization form 520. In a chip variability area 
522, the user specifies housekeeping genes with known expression levels and selects a scale 
value. The user can elect to either apply or not apply this scale value. If the user elects to 
apply the scale value, each gene expression level measured on a single chip is multiplied by a 
value equal to the desired scaling factor divided by the average of housekeeping expression 
levels measured on that chip. 
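
The chip variability scaling just described may be illustrated by the following minimal Python sketch. The function name and the dictionary layout (gene name mapped to expression level) are assumptions made for illustration, and the gene names in the usage note are examples only.

```python
def scale_chip(levels, housekeeping_genes, scale_value):
    """Multiply every level on one chip by scale_value / mean(housekeeping levels)."""
    housekeeping = [levels[gene] for gene in housekeeping_genes]
    factor = scale_value / (sum(housekeeping) / len(housekeeping))
    return {gene: level * factor for gene, level in levels.items()}
```

For example, with housekeeping levels of 200 and 400 on a chip and a scale value of 150, the factor is 150 / 300 = 0.5, so a gene measured at 30 is reported as 15.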

Also on normalization form 520, in a tissue variability area 524, the user may 
select a scale value that applies to data collected from multiple chips and whether or not it is 
applied. If this scale value is to be applied, each expression value measured in a chip set is 
multiplied by a factor equal to the scale value divided by the average expression level 
measured over all genes for the entire chip set. A transformation area 526 allows the user to 
select whether negative average difference values are to be converted to positive numbers by 
use of a logarithmic transform. The user can reset all the changes made on normalization 
form 520 by selecting a reset button 528 or apply the selected normalizations and 
transformations by selecting an apply button 530. 
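
The tissue variability scaling may be sketched in the same manner. In this hypothetical Python fragment a chip set is represented as a list of dictionaries, one per chip, and the average is taken over every measurement in the set; both choices are assumptions. The logarithmic transform of negative values is not shown because its exact form is not given here.

```python
def scale_chip_set(chips, scale_value):
    """Multiply every level in a chip set by scale_value / mean(all levels in the set)."""
    all_levels = [level for chip in chips for level in chip.values()]
    factor = scale_value / (sum(all_levels) / len(all_levels))
    return [{gene: level * factor for gene, level in chip.items()} for chip in chips]
```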

At step 406, the user filters the large set of experimental data that was 
imported, normalized, and transformed. Fig. 5E depicts a filter experiments form 532. A 
lower table 534 lists the imported experiments and genes or ESTs and the expression data 
associated with each combination of experiment and gene or EST. An upper table 536 is 
used to enter a query to filter the experiment data in lower table 534. Each column of upper 
table 536 corresponds to a column in lower table 534. Upper table 536 is similar to a 
query by example (QBE) grid as included in Microsoft Access. Predicates are entered in the 
columns of upper table 536 with all the predicates in a single row treated as ANDs and those 
between rows treated as ORs. The results satisfying a given query are displayed in lower 
table 534 upon selection of a filter button 538. Filters may be saved, deleted, and reset by use 
of appropriately labeled buttons 540, 542, and 544. A stored filter may be loaded by use of a 
drop-down list 546. Selection of an export button 547 writes the data to an Excel spreadsheet. 
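
The AND-within-a-row, OR-between-rows rule of the grid might be evaluated as in the following Python sketch. The representation of the grid as a list of dictionaries mapping a column name to a predicate function, and the column names used in the example, are hypothetical. An empty grid is assumed to select every record.

```python
def matches(record, grid):
    """True if the record satisfies every predicate of at least one grid row."""
    if not grid:
        return True  # assumption: an empty grid filters nothing out
    return any(
        all(predicate(record[column]) for column, predicate in row.items())
        for row in grid
    )


def apply_filter(records, grid):
    """Return the records that the lower table would display for this grid."""
    return [record for record in records if matches(record, grid)]


# Row 1: experiment is "E1" AND avg_diff > 100.  Row 2 (ORed): gene starts with "EST".
example_grid = [
    {"experiment": lambda v: v == "E1", "avg_diff": lambda v: v > 100},
    {"gene": lambda v: v.startswith("EST")},
]
```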


To facilitate further user queries, the user may specify a new field to be used 
as a pivot field for future queries at step 408. Elements of the selected field will become 
columns in the new table. Fig. 5F shows how a pivot value is selected by use of a drop-down 
list 548. The pivot value identifies the expression data that will be listed in the columns of 
lower table 534. Fig. 5G shows a pivot column drop-down list 550 that allows selection of a 
particular column of lower table 534 as the pivot field. The entries of the selected column are 
shown in a left list box 552 and moved to a right list box 554 to include them as rows in the 
pivoted table. The user selects arrow keys 556 to add and delete items of right list box 554. 
To perform the pivot operation, the user selects a pivot button 558. 

Fig. 5H depicts a user interface for filtering tissue types as displayed as a 
result of the pivot operation. Lower table 534 shows the result of a pivot operation as 
described with reference to Figs. 5F-5G. 
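
A pivot of this kind can be sketched in Python as shown below. The sketch assumes the unpivoted data is a list of dictionaries with one entry per combination of gene and experiment, that the experiment identifier is the pivot field, and that the result is keyed by gene; the function and column names are hypothetical.

```python
def pivot(rows, row_key, pivot_field, pivot_value, selected):
    """Turn the selected entries of pivot_field into columns holding pivot_value."""
    table = {}
    for row in rows:
        column = row[pivot_field]
        if column not in selected:
            continue  # entries left in the left list box are not pivoted
        table.setdefault(row[row_key], {})[column] = row[pivot_value]
    return table
```

Each entry of the returned dictionary corresponds to one line of the pivoted table, with one value per selected experiment.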

Upper table 536 is now used at step 410 to specify a query to filter genes using 
the results of experiments obtained from different tissue types. Again, predicates in a row are 
treated as ANDs. Predicates between rows are treated as ORs. By properly formulating a 
query, the user may answer questions such as which genes are up-regulated in normal tissue 
and down-regulated in diseased tissue. The depicted Entrez definition column contains the 
definition column from the public domain Entrez database. The depicted query marked 'like 
"growth"' retains those records having the string "growth" as a substring in the designated 
column. 

One condition satisfying the depicted query is that a gene have an expression 
level in experiment 4002736D greater than 10 and an expression level in experiment 
4003228A greater than 10 and less than 0.6 times the expression level in experiment 
4002736D. An alternate condition satisfying the query is that the expression level in 
experiment 4002736D be greater than 10 and the expression level in experiment 4003228A 
greater than 10 and greater than 1.4 times the expression level in experiment 4002736D. 

This query determines the genes that have a particular fold change pattern 
between experiment 4003228A and experiment 4002736D. It will filter out genes for which 
there is no significant fold change between the experiments. Specifically, it finds all genes 
for which the expression level of experiment 4003228A is less than 60% of the expression 
level of experiment 4002736D, or for which the expression level of experiment 4003228A is 
greater than 140% of the expression level of experiment 4002736D. Both experiments are 
also constrained to have expression levels greater than 10. 
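
The fold change query described above may be expressed as the following Python sketch operating on a pivoted table keyed by gene. The function name, the parameter names, and the table layout are hypothetical; the default limits of 10, 0.6, and 1.4 are taken from the depicted query.

```python
def fold_change_filter(table, baseline, comparison, floor=10, low=0.6, high=1.4):
    """Return genes whose comparison level is < low or > high times the baseline level.

    Both levels must exceed floor, as in the depicted query.
    """
    hits = []
    for gene, levels in table.items():
        base, comp = levels[baseline], levels[comparison]
        if base > floor and comp > floor and (comp < low * base or comp > high * base):
            hits.append(gene)
    return hits
```

Called with baseline "4002736D" and comparison "4003228A", the function applies the two ORed rows of the query as a single test per gene.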

Filters may be saved or reset by selection of buttons 560 and 562, respectively. 
The records displayed in lower table 534 may be sorted on any column(s), and columns may 
be hidden, frozen, or repositioned for better viewing. Lower table 534 may also be saved in 
different formats, including a spreadsheet format such as Microsoft Excel, by clicking on an 
export button 564. A saved filter may be accessed via a pull down menu 566 or deleted by 
selection of a delete button 568. Additional information on any gene may be obtained by 
double clicking its row. This will load an Internet browser program and open a web site such 
as the Entrez web site that stores information for the gene. The browser program then 
displays the entry for that gene. 

At step 412, by selecting a graph button 570, the user calls up a scatter-plot 
display 572 depicted in Fig. 5I. Two experiments are selected for comparison using 
drop-down lists 574 and 576 for the x axis and y axis respectively. The graph is generated by 
selecting a build scatter button 578. Each point on the scatter plot corresponds to a particular 
gene. The point is positioned on the graph according to its measured expression level in both 
experiments. By checking a box 580, the user may select to have the points color coded 
according to whether the gene was present in both (2P), one (1P), or neither (0P) of the 
experiments. By checking one or more of boxes 582, the user may elect to show or not show 
genes according to this categorization. 
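
The 2P, 1P, and 0P categorization may be illustrated by the Python sketch below. The data layout, in which each gene maps an experiment identifier to a dictionary holding a "level" and a "call" of either "present" or "absent", is an assumption, as are the function names.

```python
def presence_category(call_x, call_y):
    """Return "2P", "1P", or "0P" from the expression calls in the two experiments."""
    present = sum(call == "present" for call in (call_x, call_y))
    return f"{present}P"


def scatter_points(genes, exp_x, exp_y, show=("2P", "1P", "0P")):
    """Build (gene, x, y, category) tuples for the categories the user chose to show."""
    points = []
    for name, data in genes.items():
        category = presence_category(data[exp_x]["call"], data[exp_y]["call"])
        if category in show:
            points.append((name, data[exp_x]["level"], data[exp_y]["level"], category))
    return points
```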

By making an appropriate selection in a box 584, the user may select an 
interpretation for future mouse clicks. One choice is for the system to do nothing in response 
to a mouse click. Another choice is for the system to show gene data for a point selected by a 
mouse click. The gene data appears in a box 586 including the accession number, the gene 
name, the expression levels as measured in a variety of experiments, and an expression call 
for each experiment (either absent or present). An Entrez definition name is also shown. 
Double clicking on an entry will invoke an Internet browser to show the Entrez entry for the 
gene. 

The user may also select "rope" in box 584 to collect interesting points for 
comparison by surrounding them with a polygon. Lines are automatically drawn between 
each mouse click, encircling those genes to be included in a bar graph. The user may display 
the bar graph by selecting a button 588. 
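
The specification does not state how the system decides which points lie inside the roped polygon. One common approach, offered here only as an assumption, is the ray casting (even-odd) test sketched below in Python; the function names and the tuple layout of the points are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test; polygon is a list of (x, y) mouse-click vertices."""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]  # the last vertex connects back to the first
        if (y1 > y) != (y2 > y):           # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def roped_genes(points, polygon):
    """points: iterable of (gene, x, y); return the genes enclosed by the polygon."""
    return [gene for gene, x, y in points if point_in_polygon(x, y, polygon)]
```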

At step 414, Fig. 5J depicts a bar graph 590 for the roped genes in the scatter 
plot of Fig. 5I. Each grouping of bars in Fig. 5J corresponds to a gene. Each bar within a 
grouping corresponds to an experiment and is color-coded according to a legend 592. 
Initially only two experiments are displayed, the two experiments corresponding to the axes 
of the scatter plot of Fig. 5I. However, the user may select further experiments from a box 
594. Once the desired experiments are selected, the user selects a build button 596 to display 
the desired bar graph. A table 598 shows the expression levels for the depicted genes. 

For the display of Fig. 5J, the option "gene" is selected in a box 600. To view 
individual plots of the expression level for each gene as they vary over the experiments, the 
user may select option "experiment" in box 600 before selecting build button 596. This 
produces a line graph 602 as shown in Fig. 5K. The experiments are arranged along the 
horizontal axis in the order specified in box 594. Each gene has its own trace corresponding 
to its expression level as it varies over the experiments. A legend 604 identifies the trace for 
each gene. To change the position of an experiment along the horizontal axis, the user uses 
up and down arrows 606 and 608. This feature makes it possible to 
reorder the experiments to reflect additional sequencing knowledge. For example, if the 
experiments represent a time course such as progression of a disease or treatment, they can be 
graphically ordered in time sequence. The graph then represents the change in expression 
level as a function of time for the selected gene. A slider icon 612 allows the user to scroll 
along the horizontal axis if line graph 602 does not fit on the screen. A marker check box 614 
shows a horizontal line across line graph 602 defining a particular expression level. This 
allows the user to easily view data points above the selected level. 

More information about a gene may be obtained by clicking on any bar in the 
group. All of the information for the gene will be displayed in a separate window 610 as 
shown in Fig. 5L. 

In the foregoing specification, the invention has been described with reference 
to specific exemplary embodiments thereof. It will, however, be evident that various 
modifications and changes may be made thereunto without departing from the broader spirit 
and scope of the invention as set forth in the appended claims and their full scope of 
equivalents. For example, it will be understood that wherever "expression level" is referred 
to, one may substitute the measured concentration of any compound. Also, wherever "gene" 
is referred to, one may substitute the term "expressed sequence tag." 

WHAT IS CLAIMED IS: 

1. In a computer system, a method for operating a database storing 
expression level information comprising: 
providing a database comprising expression levels for each of a plurality of 
genes or expressed sequence tags (EST) as measured in each of a plurality of tissue types; 
accepting a user query to said database to identify desired ones of said 
plurality of genes or EST, said user query specifying expression level characteristics of said 
desired genes; and 
comparing said expression level characteristics to said expression levels stored 
in said database to identify said desired genes or EST. 

2. The method of claim 1 further comprising: 
displaying information identifying said desired genes or EST. 

3. The method of claim 1 wherein said plurality of tissue types comprise 
a diseased tissue type. 

4. The method of claim 1 wherein said plurality of tissue types comprise 
a healthy tissue type. 

5. The method of claim 1 wherein said plurality of tissue types comprise 
a cancerous tissue type. 

6. The method of claim 1 wherein said plurality of tissue types comprise 
a drug treated tissue type. 

7. The method of claim 1 wherein said plurality of tissue types comprise 
tissues obtained from disparate species. 

8. The method of claim 1 wherein said plurality of tissue types comprise 
tissues obtained from disparate organs. 

9. The method of claim 1 wherein said expression level characteristics comprise expression level ranges as measured for a particular gene in at least two of said plurality of tissue types.

10. The method of claim 1 wherein said expression level characteristics comprise relationships among expression levels as measured for a particular gene in at least two of said plurality of tissue types.
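
As a non-limiting illustration, one possible relationship among expression levels is a minimum ratio (fold change) between the levels measured for the same gene in two tissue types. The sketch below assumes that form of relationship; the function name genes_with_fold_change, the data layout, and all names and values are hypothetical and are not taken from the specification.

```python
from typing import Dict, List

# Hypothetical layout: gene or EST identifier -> {tissue type -> expression level}.
ExpressionDB = Dict[str, Dict[str, float]]


def genes_with_fold_change(
    db: ExpressionDB, numerator: str, denominator: str, min_ratio: float
) -> List[str]:
    """Return identifiers whose level in `numerator` is at least
    `min_ratio` times the level in `denominator`."""
    matches = []
    for gene, levels in db.items():
        if numerator not in levels or denominator not in levels:
            continue  # relationship is undefined without both measurements
        if levels[denominator] <= 0:
            continue  # avoid dividing by a zero or negative level
        if levels[numerator] / levels[denominator] >= min_ratio:
            matches.append(gene)
    return sorted(matches)


# Invented example data, for illustration only.
RATIO_DB: ExpressionDB = {
    "GENE_A": {"normal_liver": 12.0, "liver_tumor": 240.0},  # ratio 20
    "GENE_B": {"normal_liver": 95.0, "liver_tumor": 101.0},  # ratio about 1.06
    "EST_C": {"normal_liver": 8.0, "liver_tumor": 310.0},    # ratio 38.75
    "EST_D": {"normal_liver": 0.0, "liver_tumor": 50.0},     # skipped
}

upregulated = genes_with_fold_change(RATIO_DB, "liver_tumor", "normal_liver", 10.0)
```

Other relationships, such as an absolute difference between two levels, could be substituted for the ratio test without changing the overall structure of the comparison.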

11. The method of claim 1 further comprising:
accepting user input selecting two of said plurality of tissue types for graphical display;
displaying a first axis corresponding to a first one of said two tissue types;
displaying a second axis corresponding to a second one of said two tissue types;
for a selected one of said plurality of genes or EST, displaying a mark at a position wherein said position is selected relative to said first axis in accordance with an expression level of said selected gene or EST measured in said first tissue type and selected relative to said second axis in accordance with an expression level of said selected gene or EST measured in said second tissue type.

12. The method of claim 11 further comprising:
repeating said operation of displaying a mark for a plurality of selected genes or EST.
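
As a non-limiting illustration of the graphical display recited above, the position of each mark may be computed by taking the expression level in the first selected tissue type as the coordinate along the first axis and the level in the second selected tissue type as the coordinate along the second axis. The sketch below computes only the coordinates and leaves rendering to whatever display facility is available; the function name mark_positions and all names and values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: gene or EST identifier -> {tissue type -> expression level}.
ExpressionDB = Dict[str, Dict[str, float]]


def mark_positions(
    db: ExpressionDB, selected: List[str], first_tissue: str, second_tissue: str
) -> List[Tuple[str, float, float]]:
    """Return (identifier, first-axis coordinate, second-axis coordinate)
    for each selected gene or EST measured in both tissue types."""
    marks = []
    for gene in selected:
        levels = db.get(gene, {})
        if first_tissue in levels and second_tissue in levels:
            marks.append((gene, levels[first_tissue], levels[second_tissue]))
    return marks


# Invented example data, for illustration only.
PLOT_DB: ExpressionDB = {
    "GENE_A": {"normal_liver": 12.0, "liver_tumor": 240.0},
    "GENE_B": {"normal_liver": 95.0, "liver_tumor": 101.0},
    "EST_C": {"normal_liver": 8.0, "liver_tumor": 310.0},
}

marks = mark_positions(
    PLOT_DB, ["GENE_A", "EST_C", "NOT_IN_DB"], "normal_liver", "liver_tumor"
)
```

Under this arrangement, marks lying near the diagonal correspond to genes or EST expressed at similar levels in both tissue types, while marks far from the diagonal correspond to differential expression.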

13. In a computer system, a method for operating a database storing information about compound concentration comprising:
providing a database comprising concentrations of a plurality of compounds as measured in a plurality of samples;
accepting a user query to said database to identify desired ones of said plurality of compounds, said user query specifying concentration characteristics of said desired compounds in selected ones of said plurality of samples; and
comparing said concentration characteristics to said concentrations stored in said database to identify said desired compounds.

14. A computer program product for operating a database storing expression level information comprising:
code that provides a database comprising expression levels for each of a plurality of genes or expressed sequence tags (EST) as measured in each of a plurality of tissue types;
code that accepts a user query to said database to identify desired ones of said plurality of genes or EST, said user query specifying expression level characteristics of said desired genes;
code that compares said expression level characteristics to said expression levels stored in said database to identify said desired genes or EST; and
a computer-readable storage medium for storing the codes.

15. The product of claim 14 further comprising:
code that displays information identifying said desired genes or EST.

16. The product of claim 14 wherein said plurality of tissue types comprise a diseased tissue type.

17. The product of claim 14 wherein said plurality of tissue types comprise a healthy tissue type.

18. The product of claim 14 wherein said plurality of tissue types comprise a cancerous tissue type.

19. The product of claim 14 wherein said plurality of tissue types comprise a drug treated tissue type.

20. The product of claim 14 wherein said plurality of tissue types comprise tissues obtained from disparate species.

21. The product of claim 14 wherein said plurality of tissue types comprise tissues obtained from disparate organs.

22. The product of claim 14 wherein said expression level characteristics comprise expression level ranges as measured for a particular gene in at least two of said plurality of tissue types.

23. The product of claim 14 wherein said expression level characteristics comprise relationships among expression levels as measured for a particular gene in at least two of said plurality of tissue types.

24. The product of claim 14 further comprising:
code that accepts user input selecting two of said plurality of tissue types for graphical display;
code that displays a first axis corresponding to a first one of said two tissue types;
code that displays a second axis corresponding to a second one of said two tissue types;
code that, for a selected one of said plurality of genes or EST, displays a mark at a position wherein said position is selected relative to said first axis in accordance with an expression level of said selected gene or EST measured in said first tissue type and selected relative to said second axis in accordance with an expression level of said selected gene or EST measured in said second tissue type.

25. The product of claim 24 further comprising:
code that repeatedly applies said code that displays a mark for a plurality of selected genes or EST.

26. A computer program product for operating a database storing information about compound concentration comprising:
code that receives a database comprising concentrations of a plurality of compounds as measured in a plurality of samples;
code that accepts a user query to said database to identify desired ones of said plurality of compounds, said user query specifying concentration characteristics of said desired compounds in selected ones of said plurality of samples; and
code that compares said concentration characteristics to said concentrations stored in said database to identify said desired compounds.

27. A computer system comprising:
a processor; and
a memory storing code to operate said processor, said code comprising:
code that provides a database comprising expression levels for each of a plurality of genes or expressed sequence tags (EST) as measured in each of a plurality of tissue types;
code that accepts a user query to said database to identify desired ones of said plurality of genes or EST, said user query specifying expression level characteristics of said desired genes; and
code that compares said expression level characteristics to said expression levels stored in said database to identify said desired genes or EST.